Tumma Ashwin, College of Engineering Pune, tummaak08.comp@coep.ac.in [Primary Contact]
Hukerikar Saumil, College of Engineering Pune, hukerikarsr07.comp@coep.ac.in
Nikam Akshay, College of Engineering Pune, nikamam07.comp@coep.ac.in
Attar Vahida, College of Engineering Pune, vahida.comp@coep.ac.in [Faculty Advisor]
We built our own tool, "Pandemic Data Visualizer", specifically for MC2. Using the fields provided in the dataset, the tool determines the most fatal syndromes and the variance in admits and deaths per week.
It identifies the age groups that are more susceptible to the disease and determines whether the disease affects one gender in particular.
Based on this analysis, the tool provides visual charts that assist health officials in analyzing the illness across various countries.
Developers: Ashwin Tumma, Saumil Hukerikar, Akshay Nikam.
Download Tool Here
GNUPLOT: A portable, command-line driven graphing utility for Linux and other platforms. It supports many types of plots in both 2D and 3D. It can draw using lines, points, boxes, contours, vector fields, surfaces, and various associated text. It was used for plotting the various statistical results.
Version used : 4.2
Developed : March 2010
NetBeans IDE: A free, open-source Integrated Development Environment for Java development. The IDE was used to create the front end of the Java application.
Version: 6.8
Developed: Sun Microsystems, Inc.
Video:
ANSWERS:
MC2.1: Analyze the records you have been given to characterize the spread of the disease. You should take into consideration symptoms of the disease, rates, patterns of the onset, peak and recovery of the disease. Health officials hope that whatever tools are developed to analyze this data might be available for the next epidemic outbreak. They are looking for visualization tools that will save them analysis time so they can react quickly.
[Note: All figures are hyperlinked to a higher resolution version; please click a figure caption to make the figure more readable.]
MC2.2: Compare the outbreak across cities. Factors to consider include timing of outbreaks, numbers of people infected and recovery ability of the individual cities. Identify any anomalies you found.
The file containing statistical information about the various cities is used to obtain the figures associated with each parameter. Based on these comparisons, we are able to determine the timing of outbreaks with respect to the peak values of admits and deaths. We analyse the date of outbreak for each city and the death toll on that particular day. Charts provide a visual representation of the variations across cities.
Nonthaburi, Thailand was the first to witness the outbreak, on 4-27-2009, while the other cities started observing it between 5-23-2009 and 5-28-2009.
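The timing comparison described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names, the city names and the dates in the example are all hypothetical, and it assumes each city's death dates are available as a simple list.

```python
from collections import Counter
from datetime import date


def first_day(event_dates):
    """Earliest date on which an event (admit or death) was recorded."""
    return min(event_dates)


def peak_day(event_dates):
    """Return (day, count) for the busiest day; the earliest day wins ties."""
    counts = Counter(event_dates)
    return min(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical death dates for two made-up cities (not taken from the dataset).
deaths = {
    "City A": [date(2009, 5, 24), date(2009, 5, 25), date(2009, 5, 25)],
    "City B": [date(2009, 4, 27), date(2009, 4, 27), date(2009, 4, 28)],
}

# For every city: when deaths were first seen and when they peaked.
timing = {city: (first_day(d), peak_day(d)) for city, d in deaths.items()}

# The city whose first recorded death is the earliest.
earliest_city = min(timing, key=lambda city: timing[city][0])
```

Sorting the cities by the first element of `timing` gives the order in which they were hit, which is the kind of comparison a timing chart would show.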
As a part of data preprocessing, we first analyzed the dataset to identify the syndromes shown by the deceased across all cities. We observed patterns in the syndromes and identified the more common ones. These were classified as the fatal syndromes.
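A minimal sketch of this preprocessing step is shown below. It assumes, purely for illustration, that each record carries a comma-separated syndrome string and a flag saying whether the patient died; the field names `syndrome` and `died` and the sample records are hypothetical and are not the dataset's actual layout.

```python
from collections import Counter


def top_fatal_syndromes(records, n=4):
    """Return the n symptoms seen most often among deceased patients."""
    counts = Counter()
    for rec in records:
        if rec["died"]:
            # Normalise case and whitespace, and count each symptom
            # at most once per patient.
            symptoms = {s.strip().lower() for s in rec["syndrome"].split(",") if s.strip()}
            counts.update(symptoms)
    # Most frequent first; ties are broken alphabetically for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [symptom for symptom, _ in ordered[:n]]


# Hypothetical records, for illustration only.
sample_records = [
    {"syndrome": "Fever, Vomiting", "died": True},
    {"syndrome": "fever, diarrhea", "died": True},
    {"syndrome": "Headache", "died": False},
    {"syndrome": "Vomiting, fever", "died": True},
]

top_two = top_fatal_syndromes(sample_records, n=2)
```

Real syndrome text is likely to be messier than a clean comma-separated list, so a practical version would need more careful text matching than the simple split used here.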
We then characterized the records on the basis of predefined age groups, gender and temporal patterns.
The tool calculates statistical information across the various cities based on the following attributes: admits and deaths (both calculated in total, age-wise, gender-wise and region-wise). The results are stored in a small file which also contains the frequency of syndromes for each city. This file is then used by the tool to display various statistics and patterns, both textually and graphically.
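The aggregation described above might look roughly like the sketch below. It is an illustration under stated assumptions: the record fields (`age`, `gender`, `died`) are hypothetical, and of the age bins only 0-12 and 36-49 are mentioned in this write-up; the other two bins are placeholders chosen to fill the gaps.

```python
from collections import Counter

# Only the 0-12 and 36-49 groups appear in the text; 13-35 and 50+ are
# assumed here so that every age falls into some bin.
AGE_BINS = [(0, 12, "0-12"), (13, 35, "13-35"), (36, 49, "36-49"), (50, None, "50+")]


def age_group(age):
    """Map an age in years to the label of its bin."""
    for low, high, label in AGE_BINS:
        if age >= low and (high is None or age <= high):
            return label
    raise ValueError(f"age out of range: {age}")


def city_statistics(records):
    """Count admits and deaths in total, per age group and per gender."""
    stats = {"admits": Counter(), "deaths": Counter()}
    for rec in records:
        keys = ["total", "age:" + age_group(rec["age"]), "gender:" + rec["gender"]]
        stats["admits"].update(keys)       # every record is an admission
        if rec["died"]:
            stats["deaths"].update(keys)   # deaths are a subset of admits
    return stats


# Hypothetical records for one city.
city_records = [
    {"age": 40, "gender": "F", "died": True},
    {"age": 8, "gender": "M", "died": False},
    {"age": 45, "gender": "M", "died": True},
]
stats = city_statistics(city_records)
```

Running `city_statistics` once per city and writing the resulting counters to disk would give a small summary file of the kind described, so that charts can be drawn without rescanning the raw records.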
The tool also provides an interface for adding more cities and more patient records to the available regions for the current year. It identifies the top few fatal syndromes of the city that were observed in the last year, and in case a new patient shows a majority of these syndromes, a report giving the details of such patients is generated and the patients can be advised to undergo specific medical tests. The generated reports can be reviewed by the health officials on an hourly or daily basis.
This saves them analysis time in finding patterns, and if the number of patients in a generated report exceeds a threshold value, they can react more quickly to bring the pandemic under control.
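The report logic can be sketched as follows. This is only an illustration: "majority" is interpreted here as more than half of the fatal syndromes, and the function names, the patient fields and the threshold value are assumptions rather than details of the tool.

```python
def shows_majority(patient_symptoms, fatal_syndromes):
    """True if the patient shows more than half of the fatal syndromes."""
    fatal = {s.strip().lower() for s in fatal_syndromes}
    present = {s.strip().lower() for s in patient_symptoms}
    return len(present & fatal) * 2 > len(fatal)


def build_report(new_patients, fatal_syndromes, threshold):
    """List the flagged patients and raise an alert if there are too many."""
    flagged = [p for p in new_patients if shows_majority(p["symptoms"], fatal_syndromes)]
    return {"flagged": flagged, "alert": len(flagged) > threshold}


# The four top symptoms reported in this write-up.
fatal = ["Fever", "Vomiting", "Diarrhea", "Abdominal Pain"]

# Hypothetical new patients.
patients = [
    {"id": 1, "symptoms": ["fever", "vomiting", "diarrhea"]},
    {"id": 2, "symptoms": ["fever", "cough"]},
    {"id": 3, "symptoms": ["Abdominal pain", "Vomiting", "Fever", "Headache"]},
]

report = build_report(patients, fatal, threshold=1)
flagged_ids = [p["id"] for p in report["flagged"]]
```

With four fatal syndromes, a patient needs at least three of them to be flagged. The threshold would in practice be chosen by the health officials reviewing the hourly or daily reports.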
Considering the characterization of the pandemic spread:
With respect to the symptoms of the disease:
The top symptoms that led to death were: Fever, Vomiting, Diarrhea and Abdominal Pain.
The most vulnerable age group was [36-49] years; the least affected was the [0-12] years age group.
Gender classification shows that males and females were affected more or less equally.
With respect to mortality rates, the cities Nonthaburi, Thailand and Mersin, Turkey have the highest mortality, while Nairobi, Kenya and Aleppo, Syria seem to have lower mortality rates.
Observing the temporal patterns of the onset, we see that almost all the cities show a rise up to their peak death tolls, which fall between 5-23-2009 and 5-28-2009 for most cities.
The peak values of both admits and deaths are displayed for each city. Karachi, Pakistan has the highest peaks, and the recovery of Beirut, Lebanon seems to be the best.
The tool provides charts that support the above claims.
Fig. 1: Generation of Report of New Patients who show previously fatal syndromes
Fig. 2: Fatal Syndromes in a city
Fig. 3: Agewise Infection
Fig. 4: Weekly Variance Plot.
Fig. 5: Overall Summary
The number of people infected is obtained from the number of records provided in the dataset for each city, and it is calculated with respect to gender as well as the predefined age groups. Since the dataset makes no mention of the total population of each city, we consider only the admit and death tolls. On comparing the values across cities, we observe that Karachi, Pakistan has the maximum toll while Nonthaburi, Thailand has the least. However, the death percentage among the admits is very high for Nairobi, Kenya and Aleppo, Syria.
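Because no population figures are available, the comparison rests on the ratio of deaths to admits. A minimal sketch of that calculation is given below; the city names and tolls are made up for illustration and are not the values from the dataset.

```python
def death_percentage(admits, deaths):
    """Deaths as a percentage of admitted patients."""
    if admits <= 0:
        raise ValueError("admits must be positive")
    return 100.0 * deaths / admits


# Hypothetical (admits, deaths) tolls for two made-up cities.
tolls = {"City A": (1000, 50), "City B": (200, 30)}

percentages = {city: death_percentage(a, d) for city, (a, d) in tolls.items()}

# Cities ordered from the highest death percentage to the lowest.
by_percentage = sorted(percentages, key=percentages.get, reverse=True)
```

The example also shows why raw tolls and percentages can disagree: the made-up City A has the larger toll, but City B has the higher death percentage among its admits.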
The recovery ability is calculated on the basis of mortality, a lower infection rate and a quicker drop rate. The comparative results are presented textually as well as in graphical charts. Considering the above factors for recovery ability, the cities rank from best to worst as follows: Beirut, Nonthaburi, Tabriz, Jedda, Mersin, Tolima, Aden, Barcelona, Nairobi, Karachi, Aleppo.
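The text does not state how the three factors are combined into one ordering. One possible way, shown here only as an assumption and not as the tool's actual formula, is to rank the cities on each factor separately and add up the rank positions. All names and numbers in the example are hypothetical.

```python
def rank_positions(values, higher_is_better=False):
    """Map each city to its 1-based position when sorted from best to worst."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {city: pos for pos, city in enumerate(ordered, start=1)}


def recovery_order(mortality, infected, drop_rate):
    """Order cities from best to worst recovery using a simple rank sum."""
    r_mortality = rank_positions(mortality)                    # lower is better
    r_infected = rank_positions(infected)                      # lower is better
    r_drop = rank_positions(drop_rate, higher_is_better=True)  # quicker drop is better
    score = {c: r_mortality[c] + r_infected[c] + r_drop[c] for c in mortality}
    # Smallest combined rank first; ties are broken by city name.
    return sorted(score, key=lambda c: (score[c], c))


# Hypothetical factor values for three made-up cities.
mortality = {"A": 5.0, "B": 15.0, "C": 10.0}
infected = {"A": 300, "B": 900, "C": 200}
drop_rate = {"A": 0.6, "B": 0.2, "C": 0.4}

order = recovery_order(mortality, infected, drop_rate)
```

A rank sum weights the three factors equally; a weighted score would be an equally plausible alternative if some factor is considered more important.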
The Weekly Admits and Deaths charts show that Karachi, Pakistan has a very high toll of admits as well as deaths, while the others are proportionally equal. If mortality is considered, Nonthaburi, Thailand and Mersin, Turkey show a rate of almost 100%, while it drops below 97% in the case of Nairobi, Kenya and Aleppo, Syria.
We present an overall summary that will aid medical practitioners to take appropriate decisions and the health officials to have a summary of the statistical analysis.
Certain anomalies were detected during the comparison:
[Note: Since the dataset provides no benchmark for anomaly detection, we assume that the cumulative patterns (taken as an average value) act as benchmarks in our case.]
1. The peak of admits occurs much earlier (5-27-2009) in Nonthaburi, Thailand, while the other cities' patterns show it much later, after 5-23-2009.
2. When weekly admits are considered, all the cities show a progressive decrease in admits after the peak value, but Jedda, Saudi Arabia again reaches a value very close to its peak before finally tapering off. This pattern is observed only for this city.
3. The city of Karachi, Pakistan has a very high toll of admits and deaths, even when compared with the average value across all the cities. The deviation is very large.
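The benchmark idea in the note above, comparing each city against the average over all cities, can be sketched as follows. The tolerance value, the function name and the sample tolls are assumptions made for illustration; the text does not say how large a deviation has to be before it counts as an anomaly.

```python
from statistics import mean


def flag_anomalies(totals, tolerance=1.0):
    """Return cities whose total deviates from the all-city average by more
    than `tolerance` times that average, with their relative deviation."""
    benchmark = mean(totals.values())
    if benchmark == 0:
        return {}
    return {
        city: (value - benchmark) / benchmark
        for city, value in totals.items()
        if abs(value - benchmark) > tolerance * benchmark
    }


# Hypothetical admit totals for five made-up cities.
admit_totals = {"A": 100, "B": 120, "C": 110, "D": 900, "E": 105}

anomalies = flag_anomalies(admit_totals)
```

In this made-up example the average is 267, so only city D, at 900, lies more than one average away from the benchmark. Note that a single very large city also pulls the average upwards, which is one reason a median-based benchmark is sometimes preferred.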
Fig. 6: Timing of Outbreaks
Fig. 7: Number of People Infected
Fig. 8: Weekly Admits
Fig. 9: Mortality Rates
Fig. 10: Chart of Anomaly Detection